Topic detection with recursive consensus clustering and semantic enrichment

Abstract

Extracting meaningful information from short texts like tweets has proved to be a challenging task. Literature on topic detection focuses mostly on methods that try to guess the plausible words that describe topics whose number has been decided in advance. Topics change according to the initial setup of the algorithms and show a consistent instability, with words moving from one topic to another one. In this paper we propose an iterative procedure for topic detection that searches for the most stable solutions in terms of words describing a topic. We use an iterative procedure based on clustering on the consensus matrix, and traditional topic detection, to find both a stable set of words and an optimal number of topics. We observe, however, that in several cases the procedure does not converge to a unique value but oscillates. We further enhance the methodology using semantic enrichment via Word Embedding, with the aim of reducing noise and improving topic separation. We foresee the application of this set of techniques in automatic topic discovery in noisy channels such as Twitter or social media.
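The idea of clustering on a consensus matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy word list, the three hypothetical runs, and the 0.5 threshold are all assumptions chosen for illustration. It counts how often each pair of words is assigned to the same topic across repeated runs, then groups words whose co-assignment rate exceeds the threshold.

```python
from typing import List, Sequence


def consensus_matrix(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of runs in which each pair of items shares a topic label."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def stable_groups(matrix: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Connected components of the graph linking pairs with consensus > threshold."""
    n = len(matrix)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and matrix[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Hypothetical data: three topic-detection runs over five words.
# Label values are arbitrary per run; only co-membership matters.
words = ["goal", "match", "team", "vote", "election"]
runs = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],  # "team" drifts to the other topic in this run
    [0, 0, 0, 1, 1],
]
matrix = consensus_matrix(runs)
groups = stable_groups(matrix, threshold=0.5)
print([[words[i] for i in g] for g in groups])
```

In this toy case "team" changes topic in one run, yet it still shares a topic with "goal" and "match" in two of the three runs, so it stays in their group. An iterative variant would re-run topic detection on the grouped words and repeat until the groups stop changing, which, as the abstract notes, does not always happen.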

Similar resources

Text clustering for topic detection

The world wide web represents vast stores of information. However, the sheer amount of such information makes it practically impossible for any human user to be aware of much of it. Therefore, it would be very helpful to have a system that automatically discovers relevant, yet previously unknown information, and reports it to users in human-readable form. As the first attempt to accomplish such...

Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering

Uncovering common themes from a large number of unorganized search queries is a primary step to mine insights about aggregated user interests. Common topic modeling techniques for document modeling often face sparsity problems with search query data as these are much shorter than documents. We present two novel techniques that can discover semantically meaningful topics in search queries: i) wo...

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

AHP algorithm and unsupervised clustering in auto insurance fraud detection

This thesis is a study on insurance fraud in Iran's automobile insurance industry and explores the usage of expert linkage between unsupervised clustering and the analytical hierarchy process (AHP), and renders the findings from applying these algorithms for automobile insurance claim fraud detection. The expert linkage determination objective function plan provides us with a way to determine whi...

Consensus Clustering + Meta Clustering = Multiple Consensus Clustering

Consensus clustering and meta clustering are two important extensions of the classical clustering problem. Given a set of input clusterings of a given dataset, consensus clustering aims to find a single final clustering which is a better fit in some sense than the existing clusterings, and meta clustering aims to group similar input clusterings together so that users only need to examine a smal...

Journal

Journal title: Humanities & Social Sciences Communications

Year: 2023

ISSN: 2662-9992

DOI: https://doi.org/10.1057/s41599-023-01711-0